Journal of Vision
● Association for Research in Vision and Ophthalmology (ARVO)
Preprints posted in the last 90 days, ranked by how well they match Journal of Vision's content profile, based on 92 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Duay, K.; Nagai, T.
Realism and naturalness remain unresolved questions in vision science. This study investigates whether the physical gamut correlates with realism judgements. We conducted psychophysical experiments where observers judged the realism of natural scenes with target regions manipulated across the CIE 1931 color space. Results initially showed a moderate-to-strong correlation between judgements and a theoretical physical gamut derived from optimal colors. Further analysis revealed that the most detrimental points were in the saturated green region of the CIE 1931 xy chromaticity diagram; removing them yielded a very strong correlation. To explain this discrepancy, we modeled a real-world physical gamut based on the USGS and ECOSTRESS spectral libraries. The analysis revealed that the detrimental green chromaticities might not exist in the real world. Physical gamut theory posits that the visual system constructs internal references through empirical observation of the world. The absence of these colors in nature might therefore plausibly explain the theoretical model's failure. Ultimately, the real-world gamut exhibited an even stronger correlation with judgements, supporting our hypothesis while suggesting that the theoretical model may not be the optimal approximation of the actual physical gamut. These findings contribute to discussions on perceptual realism and offer a framework for enhancing rendering technologies.
Nakamura, A.; Luo, J.; Yokoi, I.; Takemura, H.
Visual perception of symbolic numerals is essential for everyday tasks; however, the neural and perceptual mechanisms underlying this ability remain unclear. Partially occluded digital numerals can elicit bistable perception, and adaptation to symbolic numerals alters the perception of these ambiguous stimuli. We aimed to examine how symbolic numeral adaptation is related to hierarchical visual processing by testing its interocular and interhemifield transfer. Experiment 1 tested interocular transfer by presenting the test stimulus to either the same or opposite eye as the adaptation stimulus. Experiment 2 assessed interhemifield transfer by presenting the test stimulus to either the same or opposite hemifield as the adaptation stimulus. Experiment 3 examined the interhemifield transfer of adaptation confined to the upper parts of digital numerals. Our results showed that adaptation to digital numerals induced shifted perceptual interpretations that transferred across eyes. In addition, we found that adaptation to digital numerals induced a relatively small but statistically significant interhemifield transfer. In contrast, adaptation restricted to the upper parts of digital numerals showed no significant interhemifield transfer. These findings suggest that the perceptual interpretation of symbolic numerals involves visual processing stages that integrate information across the eyes and hemifields.
Tailor-Hamblin, V. K.; Theodorou, M.; Dahlmann-Noor, A.; Dekker, T. M.; Greenwood, J. A.
Purpose: Foveal vision in individuals with albinism is impaired not only by reduced visual acuity but also by elevated crowding, the disruption of object recognition in clutter. Albinism is characterised by both retinal underdevelopment and nystagmus (uncontrolled eye movements). It is therefore unclear whether crowding is elevated primarily by image motion due to eye movements or by an additional sensory deficit. To disentangle these factors, we examined the spatial and featural selectivity of foveal crowding in albinism. We compared performance with controls and with prior data from individuals with idiopathic infantile nystagmus syndrome (IINS), where nystagmus occurs without retinal underdevelopment. Methods: Adults with albinism (n=8) and age-matched controls (n=8; 19-49 years) identified the orientation of foveal Landolt-C targets. In Experiment 1, targets were presented alone or flanked horizontally or vertically to assess spatial selectivity. In Experiment 2, flankers were of the same or opposite contrast polarity to assess featural selectivity. Stimulus size was adaptively scaled using QUEST to estimate gap-size thresholds. Results: Crowding was substantially elevated in albinism relative to both controls and IINS. Experiment 1 revealed stronger crowding for horizontally than for vertically positioned flankers in albinism, mirroring the predominant direction of nystagmic eye movements. In Experiment 2, opposite-polarity flankers did not reduce crowding, indicating an absence of selectivity for target-flanker similarity. Conclusions: Foveal crowding in albinism is markedly elevated, with a nystagmus-related spatial anisotropy and a lack of featural selectivity. These characteristics suggest that the elevation reflects both retinal image motion and a substantial sensory deficit arising from abnormal visual development.
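QUEST is named but not described in the abstract above. The sketch below shows a QUEST-style Bayesian staircase under several assumptions:

- a Weibull psychometric function on log10 gap size;
- a grid posterior with a flat prior;
- test placement at the posterior mean.

The guess rate (4 alternatives), lapse rate, slope, grid, and the class name `QuestLike` are hypothetical choices for illustration, not the study's settings.

```python
import numpy as np

# Illustrative settings (assumptions, not the study's QUEST parameters).
GUESS = 0.25   # chance rate for a hypothetical 4-orientation Landolt-C task
LAPSE = 0.01   # error rate on trials that should be easy
BETA = 3.5     # Weibull slope on a log10 size axis


def p_correct(log_size, log_threshold):
    """Weibull psychometric function on log10 gap size."""
    x = 10.0 ** (BETA * (log_size - log_threshold))
    return GUESS + (1.0 - GUESS - LAPSE) * (1.0 - np.exp(-x))


class QuestLike:
    """Grid posterior over the log10 threshold, updated trial by trial."""

    def __init__(self, grid):
        self.grid = np.asarray(grid, dtype=float)
        self.log_post = np.zeros_like(self.grid)  # flat prior

    def posterior(self):
        p = np.exp(self.log_post - self.log_post.max())
        return p / p.sum()

    def next_stimulus(self):
        # Test at the posterior mean (QUEST variants also use the mode or a quantile).
        return float(np.sum(self.grid * self.posterior()))

    def update(self, log_size, correct):
        p = p_correct(log_size, self.grid)
        self.log_post += np.log(p if correct else 1.0 - p)


# Usage with an idealized, noiseless observer whose true log threshold is 0.2 (hypothetical).
quest = QuestLike(np.linspace(-1.0, 1.0, 201))
for _ in range(40):
    size = quest.next_stimulus()
    quest.update(size, correct=size > 0.2)
estimate = quest.next_stimulus()
```

Working in log units keeps the Weibull slope constant across stimulus sizes, which is the usual reason QUEST-type procedures use a log axis.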
Cerpelloni, F.; Collignon, O.; Op de Beeck, H.
The human visual system, and in particular the Visual Word Form Area (VWFA), adapts to process letters and words, even when the stimuli do not share canonical script features, as in Braille. Here we set out to compare, in computational models, the organization of typical orthographic scripts and atypical visual scripts such as Braille. In a first experiment, we looked at how Braille letters are represented in an illiterate Convolutional Neural Network (AlexNet). We compared them with the Latin alphabet and with Line Braille, a custom line-based script. We observed a predisposition of the network, pre-trained to perform object recognition, for line-based scripts. This finding suggests an initial advantage of line junctions over Braille in script processing, likely based on typical visual computations applied to the visual world. In a second experiment, we trained two benchmark neural network architectures (AlexNet, CORnet Z) to classify words in the Latin script (literacy acquisition) and then in the Braille script (expertise acquisition). We modelled the processing of reading visual Braille and explored the networks' representations at different layers. We observed clustering of features based on the visual properties of the scripts and not on the networks' expertise. Unlike in human participants, the networks' representations of linguistic categories do not converge on a model of the linguistic (orthographic, phonological, semantic) properties. The visual processing of the trained computational models does not align with neural data recorded in expert humans. This lack of alignment suggests that the fundamental processing of reading cannot be fully explained by simple feed-forward visual processing of the script. Instead, it likely relies on additional mechanisms, including interactive relations between the visual and linguistic systems.
Engeser, M.; Babaei, N.; Kaiser, D.
Each individual person looks at natural scenes in their own unique way, resulting in a distinct perceptual experience of the world. However, little is known about why such differences in gaze emerge. Here, we test the hypothesis that idiosyncrasies in gaze behavior are predicted by inter-subject variations in internal models (expectations about how scenes typically look). In two experiments, we first characterized participants' personal internal models by asking them to draw typical bathroom and kitchen scenes. Individual differences in these drawings were quantified using an objective deep learning pipeline and, in turn, related to individual differences in gaze behavior. In Experiment 1, where participants freely viewed a set of kitchen and bathroom photographs, inter-subject similarities in internal models did not predict inter-subject similarities in gaze. In Experiment 2, we encouraged strategic exploration through gaze-contingent viewing and a memory task. Here, inter-subject similarities in internal models predicted similarities in fixation frequency and in the sequence in which different object categories were inspected. These findings suggest that internal models influence visual exploration more strongly under increased sensory uncertainty and when expectation-guided sampling of the environment is encouraged. Together, our results provide new insights into how individual expectations shape gaze behavior and help explain why people differ in how they explore the visual world.
Shurygina, O.; Wirth, L. A.; Rolfs, M.; Ohl, S.
Saccades made during memory maintenance prioritize memory for the saccade target, but it is unclear if this benefit is specific to a location or extends across memorized objects. In three experiments, we examined whether saccadic selection spreads to other locations within the same object. In Experiment 1, we asked observers to remember three oriented Gabors presented either within contour-defined objects or without object structure. A subsequent movement cue prompted observers to move their eyes to the indicated location. We then probed memory for stimuli at locations equidistant from the saccade target, in either the same or a different object. Memory was best for stimuli at locations congruent with the saccade target, and consistently weaker for other stimuli presented in the same or a different object than the saccade target. In Experiment 2, we created more complex objects by adding more object features to the stimulus. Again, memory performance was best for stimuli congruent with the saccade target location, whereas memory in incongruent trials was worse and similar for stimuli in the same and different object as the saccade target. In Experiment 3, we tested if saccadic selection is present and propagates within the object in a change detection task. Again, memory performance (i.e., change detection) was best at the saccade target location. However, this memory benefit also spread to other locations within the same object. Our results imply that saccadic selection in visual working memory is primarily space-based but can also spread towards locations within the object where a saccade was directed.
Lipsky, T.; Ehrenzeller, C.; Ansari, G.; Pfau, K.; Harmening, W.; Wu, Z.; Pfau, M.
Purpose: To quantify whether fundus tracking in microperimetry improves psychometric parameter estimation (an in vivo demonstration of improved stimulus-delivery precision), and to derive a psychometrically grounded criterion intensity for suprathreshold (defect-mapping) microperimetry. Methods: Twenty-five healthy volunteers underwent MAIA2 microperimetry at five loci: three outside and two inside the blind spot. Frequency-of-seeing (FoS) functions were measured in four blocks (two with tracking on, two with tracking off). FoS data were fit with cumulative-Gaussian psychometric functions to estimate sensitivity parameters. Mixed-effects models assessed tracking effects, and posterior simulations defined the optimal criterion intensity for separating 'seeing' from 'non-seeing' loci. Results: Tracking had little effect on threshold estimates at loci outside the blind spot. It did lower threshold estimates within the blind spot (posterior median difference, PMD [95% CrI], of -1.46 dB [-2.30, -0.62] at locus 4 and -1.02 dB [-1.94, -0.08] at locus 5). Tracking was associated with steeper psychometric slope parameters at loci 1-3 (PMD of -0.14 dB [-0.29, 0.01], -0.27 dB [-0.43, -0.12], and -0.22 dB [-0.40, -0.04]). Without tracking, false-positive responses were more frequent when fixation shifts displaced stimuli toward the 'seeing' retina. Simulation-based analysis identified 13 dB as the nominally optimal criterion for suprathreshold microperimetry (Youden index: 0.76 [0.74, 0.79]), comparable to 10 dB (0.74 [0.72, 0.76]). Conclusions: Even in healthy volunteers with stable fixation, fundus tracking measurably reduced sensitivity estimates at 'non-seeing' loci and sharpened FoS curves in the 'seeing' retina. A criterion intensity of 10 to 13 dB is a defensible choice for separating 'seeing' and 'non-seeing' retina in suprathreshold (defect-mapping) perimetry paradigms.
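The criterion analysis above rests on two pieces that can be sketched in a few lines:

- a cumulative-Gaussian frequency-of-seeing curve;
- the Youden index, J = sensitivity + specificity - 1, evaluated for a candidate criterion intensity.

This is a minimal sketch with fixed, hypothetical thresholds: 25 dB for a 'seeing' locus, 5 dB for a 'non-seeing' one, and a common 2-dB spread. The study instead derived its criterion from posterior simulations. The function names are illustrative.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def p_seen(stimulus_db: float, threshold_db: float, spread_db: float) -> float:
    """Cumulative-Gaussian frequency-of-seeing curve.

    Perimetric dB is attenuation: larger values mean dimmer stimuli, so the
    probability of a 'seen' response falls as stimulus_db exceeds threshold_db.
    """
    return _STD_NORMAL.cdf((threshold_db - stimulus_db) / spread_db)


def youden_index(criterion_db: float, seeing_db: float,
                 non_seeing_db: float, spread_db: float) -> float:
    """J = sensitivity + specificity - 1 for one suprathreshold criterion."""
    sensitivity = p_seen(criterion_db, seeing_db, spread_db)              # seeing locus reported as seen
    specificity = 1.0 - p_seen(criterion_db, non_seeing_db, spread_db)    # non-seeing locus reported as not seen
    return sensitivity + specificity - 1.0


# Hypothetical loci: 'seeing' threshold 25 dB, 'non-seeing' 5 dB, spread 2 dB.
criteria = [0.5 * k for k in range(61)]  # candidate criteria, 0 to 30 dB in 0.5-dB steps
best = max(criteria, key=lambda c: youden_index(c, 25.0, 5.0, 2.0))
```

With equal spreads, J peaks midway between the two thresholds. Unequal spreads or uncertainty in the thresholds, as in real data, shift the optimum and lower the peak J.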
Ekinci, M. A.; Buhlmann, N.; Kaiser, D.
Aesthetic experiences in everyday life unfold under continuously changing visual input. Although these experiences clearly depend on the observer and context, they are partly explained by the visual features of the input. Here, we investigated how well a combination of visual features predicts dynamic aesthetic experiences during naturalistic and artistic movie watching. In two experiments, participants continuously rated the aesthetic appeal of either the nature documentary Home or the animated art-style movie Loving Vincent. We modeled moment-to-moment ratings using image-computable visual features extracted from each movie frame, including visual fluency, color and motion statistics, and symmetry. Linear models trained on these features reliably predicted aesthetic ratings for new movie parts, both within and across observers, pointing to shared perceptual influences on aesthetic experiences. Model comparisons showed that visual fluency and color-related features were most informative for predicting aesthetic experience in both movies. Critically, models trained on one movie could reliably predict aesthetic appeal ratings in the other movie, despite the movies' remarkably different content and styles. Color features were most informative for cross-movie prediction. We conclude that visual features shape dynamic and naturalistic aesthetic experiences, and that the mapping of visual features onto aesthetic appeal is stable across observers and different movie content.
Koenderink, J.; van Doorn, A.; Braun, D. I.; Gegenfurtner, K. R.
A complete empirical characterization of color discrimination in three dimensions has long remained out of reach. Classical studies, beginning with MacAdam's ellipses, provided local measurements in restricted chromatic planes. However, a spatially dense and internally consistent mapping of discrimination structure across the full color space has not yet been achieved. Here we present such a systematic three-dimensional measurement of color discrimination in RGB space. Eight observers measured discrimination regions at 35 reference colors distributed on a body-centered cubic lattice within the RGB cube. At each location, color differences were probed along seven orientations, yielding 14 directional extents. These measurements defined centrally symmetric convex regions that were fitted with minimum-volume ellipsoids, providing a compact description of local discrimination structure. Ellipsoids were represented as symmetric positive-definite matrices and analyzed using a Frobenius geometry, enabling normalization across observers and smooth interpolation to arbitrary locations. The resulting metric field is spatially smooth, highly structured, and remarkably consistent across observers up to an individual global scale factor. Grain size increases along the achromatic axis and exhibits systematic chromatic asymmetries. Comparison with CIEDE2000 reveals substantial agreement in overall scale variation but systematic differences in local anisotropy. Together, these data provide a coherent three-dimensional empirical mapping of color discrimination across RGB space and establish an empirical framework for perceptual color metrics.
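The phrase "represented as symmetric positive-definite matrices and analyzed using a Frobenius geometry" can be made concrete with a small sketch. It assumes the following:

- Each ellipsoid is a 3x3 SPD matrix M describing {x : x^T M^{-1} x <= 1}, so square-rooted eigenvalues give the semi-axis lengths.
- Distances use the Frobenius norm.
- Interpolation is a convex combination, which keeps matrices SPD.
- Each observer's field is divided by one global scale.

The parameterization, the mean-norm normalization, and all helper names are assumptions for illustration, not the authors' procedure.

```python
import numpy as np


def semi_axes(m):
    """Semi-axis lengths of {x : x^T M^{-1} x <= 1}, an assumed parameterization."""
    return np.sqrt(np.linalg.eigvalsh(m))


def frobenius_distance(a, b):
    return float(np.linalg.norm(a - b, ord="fro"))


def interpolate(a, b, t):
    """Convex combination; remains symmetric positive-definite for 0 <= t <= 1."""
    return (1.0 - t) * a + t * b


def normalize_field(field):
    """Divide one observer's matrices by a single global scale (mean Frobenius norm)."""
    scale = np.mean([np.linalg.norm(m, ord="fro") for m in field])
    return [m / scale for m in field]


# Two hypothetical reference-color ellipsoids for one observer.
a = np.diag([1.0, 4.0, 9.0])                 # semi-axes 1, 2, 3
b = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.0],
              [0.0, 0.0, 1.0]])
observer_1 = [a, b]
observer_2 = [3.0 * a, 3.0 * b]              # same shapes, different global scale

# Effectively zero: normalization removes the global scale difference.
gap = frobenius_distance(normalize_field(observer_1)[0], normalize_field(observer_2)[0])
midpoint = interpolate(a, b, 0.5)
```

A flat (Euclidean) metric on matrix entries is the simplest choice. Affine-invariant or log-Euclidean metrics are common alternatives when geodesics between very differently shaped ellipsoids matter.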
Sun, H.; Birney, A.; Singh, N.; Olszko, A.; Chen, P.; Ke, J.; Rosenberg, M. D.; Jangraw, D. C.
Mind-wandering (MW) is a frequent and pervasive phenomenon, yet it is commonly assessed using self-reports or probe-based methods that offer limited temporal precision regarding its onset. In this study, we introduce a novel paradigm, ReMind, that estimates the onset and duration of MW episodes during natural reading by combining retrospective self-reports with eye-tracking. Participants indicated the words where they believed their mind started and stopped wandering, and these reports were aligned with gaze timestamps to estimate MW onset. Using data from 44 participants, we examined whether knowledge of MW onset improves the detection of MW from eye-tracking signals. To evaluate relevance for both self-report and thought-probe paradigms, we additionally simulated thought probes by randomly sampling time points during reading. Logistic regression classifiers trained on eye-tracking features extracted from time windows anchored to MW onset achieved AUROC scores of 0.659 and 0.621 under the self-report and simulated thought-probe paradigms, respectively, using leave-one-subject-out cross-validation. In both cases, onset-aligned windows outperformed classifiers trained using arbitrary MW windows. Sliding-window analyses further revealed systematic temporal changes around MW onset, with classification performance peaking at approximately 3 seconds after onset. Feature-level analyses showed reduced fixation rate and fixation dispersion, along with increased pupil size following MW onset. Together, these findings characterize the temporal progression from on-task reading to MW. Overall, ReMind provides a useful framework for studying the temporal dynamics of MW during naturalistic reading.
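The evaluation scheme described above (logistic regression, leave-one-subject-out cross-validation, AUROC) can be sketched as follows. This is a minimal sketch, not the ReMind pipeline, and it makes three assumptions:

- Plain batch gradient descent stands in for whatever solver the authors used.
- Held-out scores are pooled across participants before one AUROC is computed; the abstract does not say whether AUROC was pooled or averaged per participant.
- The synthetic features, learning rate, and iteration count are hypothetical.

```python
import numpy as np


def auroc(labels, scores):
    """AUROC as P(score_pos > score_neg), counting ties as 1/2 (Mann-Whitney form)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def with_intercept(x):
    return np.column_stack([np.ones(len(x)), x])


def fit_logistic(x, y, lr=0.1, n_iter=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    xb = with_intercept(x)
    w = np.zeros(xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-xb @ w))
        w -= lr * xb.T @ (p - y) / len(y)   # gradient of the mean log-loss
    return w


def leave_one_subject_out(x, y, subjects):
    """Pool held-out probabilities across folds, then compute a single AUROC."""
    scores = np.empty(len(y))
    for s in np.unique(subjects):
        test = subjects == s
        mu, sd = x[~test].mean(axis=0), x[~test].std(axis=0)   # training-set statistics only
        w = fit_logistic((x[~test] - mu) / sd, y[~test])
        scores[test] = 1.0 / (1.0 + np.exp(-with_intercept((x[test] - mu) / sd) @ w))
    return auroc(y, scores)


# Hypothetical data: 5 participants x 40 windows, 3 features; MW shifts feature 0 only.
rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(5), 40)
y = rng.integers(0, 2, size=subjects.size).astype(float)
x = rng.normal(size=(subjects.size, 3))
x[:, 0] += 1.5 * y
score = leave_one_subject_out(x, y, subjects)
```

Computed this way, AUROC is the probability that a randomly chosen MW window scores higher than a randomly chosen on-task window. Chance is therefore 0.5.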
Altinordu, N.; Boynton, G. M.; Fine, I.
Color is a prominent feature of visual experience, yet humans can recognize objects easily and accurately from grayscale images. We examined whether color becomes more useful when spatial information is degraded due to blurring. Participants viewed naturalistic scenes in color or grayscale, and reported whether a named target object was present across a range of blur levels that simulated optical defocus from 0-8 diopters. With unblurred images, performance did not differ between color and grayscale conditions, but as blur increased, recognition accuracy declined. Color provided a modest but reliable advantage at higher levels of blur, suggesting that color becomes increasingly useful when optical quality is degraded. We hypothesize that the evolutionary shift towards trichromacy may have been partially driven by the need to compensate for optical degradation due to aging and/or accumulated light exposure.
Vanni, S.; Vedele, F.; Hokkanen, H.
The primate retina dissects visual scenes into multiple retinocortical streams. The most numerous retinal ganglion cell (GC) types, midget and parasol cells, are further divided into ON and OFF subtypes. These four GC populations have anatomical and physiological asymmetries, which are reflected in the spike trains received by downstream circuits. Computational models of the visual cortex, however, rarely take GC signal processing into account. We have built a macaque retina simulator with the aim of providing biologically plausible spike trains for downstream visual cortex simulations. The simulator is based on realistic sampling density and receptive field size as a function of eccentricity. It also uses two distinct spatial and three temporal receptive field models. Starting from data in the literature and earlier receptive field measurements, we synthesize distributions of receptive field parameters, from which the synthetic units are sampled. The models are restricted to monocular and monochromatic stimuli and follow data from the temporal hemiretina, which is more isotropic. We show that the model patches conform to anatomical data not used in the reconstruction process. We also characterize the responses with respect to spatial and temporal contrast sensitivity functions. The simulator takes a stimulus video as input and provides biologically plausible spike trains for the distinct unit types. This supports the development of thalamocortical primate model systems of vision. In addition, it can provide a reference for more biophysical retina models. The independent parameters are housed in text files, supporting reparameterization for particular macaque data or other primate species. Author summary: The visual environment provides a rich source of information, and the structure and function of the visual system have been studied for decades in many species, including humans.
The most complex data in mammalian species are processed in the cerebral cortex, but to date we still lack a functioning model of cortical computations. Earlier anatomical and physiological data describe many details of the visual system. Understanding its functional logic, however, requires numerically simulating the complex interactions within this system. To pave the way for simulating visual cortex computations, we have developed a functioning model of the macaque retina. The neuroinformatics component comprises a review and re-digitization of existing retina data from the literature, as well as statistics of earlier macaque receptive field data. Finally, we provide software that brings the collected neuroinformatics to life and allows researchers to convert visual input into biologically feasible spike trains for simulation experiments of the visual cortex.
Noerenberg, W.; Schweitzer, R.; Rolfs, M.
Saccadic eye movements sweep the visual scene across the retina, yet the resulting motion is rarely perceived. Visual factors alone, such as the presence of static pre- and post-saccadic images, can attenuate motion perception, suggesting a masking of the motion signal during early visual processing. Here, we isolated the visual component of this reduction in motion perception using simulated saccades presented to fixating observers. Across two experiments, we manipulated motion amplitude (6-18 dva), duration, and velocity profile and measured perceived amplitude and velocity at varying masking durations. Visual masking strongly reduced perceived motion amplitude and velocity, with short halftimes (~15 ms) that were largely invariant across saccade amplitudes. Critically, motion following a naturalistic saccadic velocity profile was perceived as smaller and slower than constant-velocity motion matched in amplitude and duration, even without explicit masking. This additional reduction increased with both amplitude and duration. These results show that visual mechanisms alone can account for substantial motion reduction across a large range of amplitudes. They also demonstrate a partially separable contribution of the saccadic velocity profile, suggesting that the temporal structure of retinal motion itself supports perceptual continuity across eye movements.
Horvath, G.; Rado, J.; Czigler, A.; Fülöp, D.; Sari, Z.; Kovacs, I.; Buzas, P.; Jando, G.
Binocular vision depends on the integration of matching visual features across the two eyes, while conflicting interocular signals can engage active inhibitory processes in the visual system. To investigate the temporal dynamics of these putative inhibitory processes, we examined how transitions between different binocular correlation states influence perceptual detectability and response speed. We used dynamic random-dot correlograms, which are free of monocular cues and allow precise interocular manipulation. With these, we presented brief target intervals embedded in longer background sequences. Stimuli varied in binocular correlation: correlated (C) patterns contained identical luminance profiles in both eyes, anticorrelated (A) patterns had inverted luminance dots, and uncorrelated (U) patterns had independent dot arrangements. Across three experiments, we measured (1) the presentation duration threshold required to detect a change in correlation, (2) simple reaction times (RTs) to the same transitions at suprathreshold levels, and (3) psychometric functions across durations for selected transitions. In Experiment 1, A→C transitions yielded significantly higher duration thresholds than C→A, indicating a suppressive influence associated with prior anticorrelation. In contrast, Experiment 2 showed that A→C transitions produced the shortest RTs, while C→U transitions were slowest, suggesting a rebound-like facilitation following prior suppression. Experiment 3 confirmed these temporal and contrast dependences. Contrast thresholds and reaction times changed in opposite directions for transitions toward versus away from the correlated fusional states. This divergence between perceptual onset and reaction time is consistent with a two-phase account. In this account, binocular anticorrelation is associated with an initial suppressive phase, followed by rebound-like facilitation that accelerates responses once the target becomes detectable.
These findings are consistent with current models of binocular rivalry and fusion, and provide a temporally resolved behavioral perspective on how inhibitory control in sensory systems may dynamically influence subsequent responsiveness under conditions of perceptual ambiguity.
Khan, R.; Bekiari, S.; Hierck, B.; Salvatori, D.; Kenemans, L.
Mental rotation in 3D is a key cognitive skill involving dynamic spatial transformations, for which pronounced individual differences have been documented. Here we ask whether individual differences in 3D abilities can be explained by analogous differences in 2D abilities. 3D mental rotation was assessed with the Vandenberg & Kuse Mental Rotation Test (3D-MRT). We examined its association with performance and underlying electrocortical mechanisms during a 2D letter rotation task. Participants (N=40) first completed the MRT and then performed a computerized 2D letter rotation task while EEG was recorded. In this task, they identified whether letters were oriented in a standard or a mirrored direction (parity judgment) when rotated by 0°, 60°, 120°, and 180°. Reaction times (RTs) and error rates increased with angular disparity. The angular disparity effect on RT was smaller for mirrored letters. Low, relative to high, 3D-MRT scoring participants showed more pronounced accuracy declines at higher rotation angles. An event-related potential (ERP) known as the Rotation-Related Negativity (RRN) became more pronounced with increasing angular disparity. High 3D-MRT scores were associated with a stronger RRN response at central-parietal sites. In addition, the P3b ERP wave was more pronounced at central-parietal sites for low 3D-MRT scorers, independent of angular disparity. We conclude that 3D rotational ability is positively associated with 2D mental rotation performance. It is more strongly associated with enhanced recruitment of neural visual-spatial cortical representations than with enhanced recruitment of more general cognitive resources.
Coggan, D. D.; Tong, F.
Human object recognition is robust to challenging conditions, such as when one's view of an object is fragmented by an occluding foreground object. In comparison, deep neural networks (DNNs) are typically more susceptible to occlusion, suggesting that human vision relies on distinct mechanisms. Here, we investigated the role of visual diet in the emergence of these mechanisms. We asked whether human-like robustness might arise in DNNs trained with image datasets that better reflect the properties of occlusion in natural vision. We trained convolutional and transformer DNNs to classify one of three image sets: clear images only, images augmented with artificial occluders (i.e., geometric shapes), or images augmented with natural occluders (objects segmented from photographs). We then evaluated DNN occlusion robustness and compared the networks' performance profiles with those of 30 human participants. DNNs trained with artificial occluders remained vulnerable to natural occlusion. They also exhibited less human-like performance than those trained with natural occlusion. Our findings suggest that human robustness to visual occlusion arises from learning to disentangle natural objects from each other rather than simply learning to recognize objects from partial views. They also imply that commonly used forms of artificial occlusion are unsuitable for evaluating or promoting robustness to real-world occlusion in DNNs.
Linde-Domingo, J.; Ortiz-Tudela, J.; Voeller, J.; Hebart, M. N.; Gonzalez-Garcia, C.
Visual inputs during natural perception are highly ambiguous: objects are frequently occluded, lighting conditions vary, and object identification depends significantly on prior experiences. Why do certain images remain unidentifiable while others can be recognized immediately, and what visual features drive subjective clarification? To address these questions, we developed a unique dataset of 1,854 ambiguous images. We collected more than 100,000 ratings (from a total of 947 participants) evaluating the images' identifiability before and after participants saw undistorted versions of them. We related a brain-inspired neural network model's representations of our images to human ratings. Subjective identification depended largely on the extent to which higher-level visual features from the original images were preserved in their ambiguous counterparts. In line with these results, an image-level regression analysis showed that the subjective identification of ambiguous images was best explained by high-level visual dimensions. Notably, the predominance of higher-level over lower-level features softens after participants disambiguate the images. This suggests that the visual system flexibly shifts from top-down guessing to bottom-up matching after disambiguation. Moreover, ambiguity resolution was accompanied by a notable decrease in semantic distance and greater consistency in object naming among participants. However, the relationship between information gained after disambiguation and subjective identification was non-linear, indicating that acquiring more information does not necessarily enhance subjective clarity. Instead, we observed a U-shaped relationship: subjective identification improves when the acquired information either strongly matches or strongly mismatches prior predictions.
Collectively, these findings advance our understanding of how we resolve ambiguity and extract meaning from incomplete visual information.
Geisler, W. S.; Das, A.
The human visual system segments images using both high-level recognition mechanisms and low-level mechanisms that are largely independent of specific prior experience. The low-level mechanisms are essential for initiating recognition processes and for learning to recognize new materials, objects, and contexts. Here we describe a hierarchical Bayesian observer (HBO) model of texture segmentation that is biologically plausible, takes into account the statistics of natural scenes, and does not depend on prior experience. The HBO model consists of five steps:

1. local similarity grouping with local normalization;
2. mutual similarity grouping (local grouping is strengthened if the neighboring regions are similar to the same set of other regions);
3. transitive grouping (good continuation);
4. confidence grouping (neighboring regions far from the same-different decision boundary guide grouping of regions near the decision boundary);
5. region grouping (similarity grouping of the regions from the initial segmentation).

We find that a local similarity grouping process, trained to maximize accuracy, predicts human texture discrimination accuracy. We then find that the four additional steps accurately segment images with randomly shaped regions containing arbitrary natural textures. The success of the model depends on all the steps, but especially on local-similarity and transitive grouping. We also find that transitive grouping allows correct segmentation of non-stationary texture regions (e.g., textures slanted in depth). Further, when illumination varies across the image, local normalization enables both correct texture segmentation and estimation of illumination change. Finally, we find that, unlike our model, large state-of-the-art deep networks often fail on these stimuli.
Truong, N.; Noei, S.; Karami, A.
Convolutional neural networks (CNNs) have become essential models for predicting neural activity and behavior in visual tasks. However, their ability to capture higher-level cognitive functions, such as numerosity discrimination, remains debated. Numerosity is the ability to perceive and estimate the number of items in a visual scene. Within CNNs, it is often proposed to rely on specialized number-detector units, analogous to number-selective neurons observed in the brain. In this study, we use CORnet, a CNN architecture inspired by the organization of the primate visual system. Classical Representational Similarity Analysis (RSA) assumes that all units contribute equally. To address this limitation, we apply pruning, a feature selection approach that identifies the units most relevant for explaining behavioral similarity structure. Our results show that number-detector units are not critical for population-level representations of numerosity, challenging the role proposed for them in previous studies.
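RSA with unit pruning, as described above, can be sketched as follows. The sketch assumes:

- Euclidean-distance RDMs;
- a Pearson correlation between model and behavioral RDMs (Spearman is a common alternative);
- greedy backward elimination that drops a unit only when doing so raises the fit.

The abstract does not specify the pruning rule, distance measure, or stopping criterion, so all of these, along with the synthetic data, are hypothetical.

```python
import numpy as np


def rdm(activations):
    """Euclidean distances between condition patterns (rows = conditions, columns = units)."""
    diff = activations[:, None, :] - activations[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def upper(m):
    """Off-diagonal upper triangle of a square matrix, as a vector."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def rsa_score(activations, behavior_rdm):
    """Pearson correlation between model and behavioral RDMs."""
    return float(np.corrcoef(upper(rdm(activations)), upper(behavior_rdm))[0, 1])


def greedy_prune(activations, behavior_rdm, min_units=5):
    """Backward elimination: repeatedly drop the unit whose removal most improves the fit."""
    keep = list(range(activations.shape[1]))
    best = rsa_score(activations, behavior_rdm)
    while len(keep) > min_units:
        candidates = []
        for u in keep:
            rest = [k for k in keep if k != u]
            candidates.append((rsa_score(activations[:, rest], behavior_rdm), u))
        score, unit = max(candidates)
        if score <= best:
            break  # no single removal improves the fit any further
        best = score
        keep.remove(unit)
    return keep, best


# Hypothetical data: 8 numerosities x 20 units; only units 0-5 scale with log numerosity.
rng = np.random.default_rng(2)
log_n = np.log(np.arange(1, 9))
acts = rng.normal(size=(8, 20))
acts[:, :6] += np.outer(log_n, rng.uniform(1.0, 3.0, size=6))
behavior = np.abs(log_n[:, None] - log_n[None, :])   # assumed behavioral dissimilarity
full_score = rsa_score(acts, behavior)
kept, pruned_score = greedy_prune(acts, behavior)
```

One way to ask whether number-detector units are critical would be to compare the retained units with a separately identified set of number detectors. The abstract does not say how the authors made that comparison.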
Yu, Y.; Hafed, Z. M.
Visual response strength in the primate superior colliculus (SC) has recently been shown to correlate inversely with trial-by-trial saccadic reaction time, much more strongly than visual response strength in the primary visual cortex (V1). However, for any given visual stimulus onset, populations of neurons in each brain area are concurrently activated. This leaves open how well V1 visual response strength predicts trial-by-trial saccadic reaction time when multiple simultaneously recorded neurons are taken into account. Using a classic visually-guided saccade task, we assessed how well trial-by-trial saccadic reaction time could be predicted from the visual response strengths of 1 to 10 simultaneously recorded neurons in each brain area. For each session, we modeled saccadic reaction time as a weighted linear combination of the visual response strengths of N simultaneously recorded neurons. Consistent with prior work, the visual response strength of a single SC neuron predicted reaction time better than that of a single V1 neuron. Adding more simultaneously recorded neurons improved prediction substantially in the SC, but not in V1. Only for 100% contrast dark stimuli (darker in luminance than the surrounding gray background) did V1 show an increase in prediction quality with more simultaneously recorded neurons. This increase, which was still substantially weaker than in the SC, could reflect the preference of V1 neurons for dark contrasts. These results suggest that despite qualitative similarities between SC and V1 visual responses, SC visual responses are functionally reformatted from their V1 counterparts. Significance: The superior colliculus (SC) is an important sensory-motor structure for controlling eye movements, and it receives a significant portion of its inputs directly from the primary visual cortex (V1).
Despite this, SC visual responses are much better correlated with trial-by-trial variability in saccadic eye movement timing than V1 visual responses, and this effect is strongly amplified when considering simultaneously recorded neurons. Thus, SC and V1 visual responses serve fundamentally different functions from a motor perspective.
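The session-level model in the abstract above predicts reaction time as a weighted linear combination of the visual response strengths of N simultaneously recorded neurons. It can be sketched as ordinary least squares with an intercept, scored by cross-validated R². The 5-fold scheme, the R² metric, and the synthetic session are assumptions for illustration; the abstract does not say how prediction quality was quantified.

```python
import numpy as np


def cv_r2(responses, rt, n_folds=5):
    """Cross-validated R^2 of RT ~ w0 + sum_k w_k * response_k (ordinary least squares)."""
    n = len(rt)
    folds = np.array_split(np.arange(n), n_folds)   # contiguous folds; shuffle first if trials drift
    pred = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        x_train = np.column_stack([np.ones(train.size), responses[train]])
        w, *_ = np.linalg.lstsq(x_train, rt[train], rcond=None)
        pred[test] = np.column_stack([np.ones(test.size), responses[test]]) @ w
    return 1.0 - np.sum((rt - pred) ** 2) / np.sum((rt - rt.mean()) ** 2)


# Hypothetical session: 200 trials, 10 neurons; stronger responses -> faster saccades.
rng = np.random.default_rng(1)
responses = rng.normal(size=(200, 10))
rt = 200.0 - 10.0 * responses.sum(axis=1) + rng.normal(scale=5.0, size=200)

# Prediction quality as neurons are added one at a time (N = 1 ... 10).
r2_by_n = [cv_r2(responses[:, :k], rt) for k in range(1, 11)]
```

In this toy session every neuron carries independent, equally strong reaction-time information, so R² grows with N. The V1 result in the abstract corresponds to the case where added neurons contribute little beyond the first.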